Correlations in weighted networks

We develop a statistical theory to characterize correlations in weighted networks. We define the appropriate metrics quantifying correlations and show that strictly uncorrelated weighted networks do not exist due to the presence of structural constraints. We also introduce an algorithm for generating maximally random weighted networks with arbitrary $P(k,s)$ to be used as null models. The application of our measures to real networks reveals the importance of weights for a correct understanding and modeling of these heterogeneous systems.

In the current era of fast technological progress, heterogeneous transport systems appear at the core of the latest revolutionary advances. The information technology revolution is perhaps one of the most outstanding examples, with the Internet [1] effectively reshaping the ways of social and economic interaction. The success of this revolution is, at the same time, intimately linked to the development of other infrastructures that also involve transport. This is the case of globalized transportation systems and, in particular, of the worldwide airport network [?, ?], which serves as a substrate for the transport of people, goods, and even diseases [?] throughout the world on a very short time scale. Because of their profound and far-reaching impact, it is crucial to develop theoretical tools that increase our understanding of the large-scale properties of these systems and that can help in engineering them against possible malfunction or jamming.

Both the Internet and the worldwide air transportation system, and in general most heterogeneous transport systems, can be represented as weighted complex networks (WCNs) [?], in which vertices stand for the elementary units composing the system and edges represent the interactions or relations between pairs of units. Edges are further characterized by a weight measuring the capacity or the amount of traffic of a particular connection [6]. Although the theory of unweighted complex networks, where edges are modeled exclusively as present or absent, is today well established [?, ?, ?], an equivalent formalism for the weighted case is not yet available, and present knowledge comes from particular models of growing WCNs [?]. This makes it difficult to define suitable observables to characterize these systems properly. For instance, several definitions of the basic correlation functions [?] have been suggested [?], but it is not clear which of them provide the correct measures. What is worse, no proper null model for the presence of correlations has been proposed for comparison with empirical data. Null models are particularly relevant in this context because heterogeneous networks usually display unavoidable structural correlations, which can lead to a mistaken understanding of the principles that shape the system and its functionality [?].

In this paper, we fill this gap by introducing a rigorous framework for the characterization of correlations in WCNs that allows us to define proper measures. We shall see that, at the weighted level, strictly uncorrelated networks do not exist due to structural constraints. Yet, our formalism enables us to define an algorithm that generates maximally random WCNs with arbitrary local properties, to be used as a null model with respect to non-structural correlations. This algorithm corresponds to a weighted version of the random graph ensemble proposed by Chung and Lu [?]. We also define correlation measures that filter out the structural constraints. As an example, we apply our formalism to the US airport system [16] (USAN), the scientific collaboration network [17] (SCN), and the world trade web [?] (WTW). The information obtained reveals that weights, rather than the bare topology, rule the architecture of some of them.

Unweighted networks can be fully characterized by means of a binary variable $a_{ij}$, taking the value $a_{ij} = 1$ when the edge between vertices $i$ and $j$ is present and $0$ otherwise. Relevant statistical topological properties can then be derived from this adjacency matrix, more specifically the degree distribution $P(k)$, defined as the probability that a vertex is connected to $k$ other vertices, degree correlations measured by the average degree of the nearest neighbors as a function of the vertex degree, $k_{nn}(k)$ [?], and the degree-dependent clustering coefficient $c(k)$ [20, 21]. In the case of WCNs, each edge is assigned a real or natural number $w_{ij}$, representing the weight or intensity of the connection between $i$ and $j$. Thus, apart from the vertex degree $k_i$, the presence of weights allows us to define other significant properties, such as the vertex strength $s_i$ [?, ?], given by $s_i = \sum_j w_{ij}$, and statistical distributions such as the strength distribution $P(s)$, the average strength of vertices of degree $k$, $s(k)$, or, more generally, the joint probability $P(k, s)$ that a vertex has degree $k$ and strength $s$ simultaneously. However, the strength alone is not enough to capture the weighted structure of vertices, since the ratio $s/k$ gives only the average weight per connection but says nothing about fluctuations around this average. Therefore, we need to introduce some measure of the fluctuations of the weights of a given vertex. To this end, we use the disparity $Y$, defined as $Y_i = \sum_j (w_{ij}/s_i)^2$. Now, our main hypothesis is that all vertices with the same degree, strength, and disparity, that is, characterized by the same vector variable $\alpha = (k, s, Y)$, are statistically equivalent, so that we can define $P(\alpha) = P(k, s, Y)$ as the probability that a given vertex has degree $k$, strength $s$, and disparity $Y$. Without loss of generality, we will also assume that the strength is a discrete variable, so that the equivalence classes form a countable set.
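
As an illustration of the three local quantities entering $\alpha = (k, s, Y)$, the following minimal Python sketch computes them for every vertex of a small graph. The dict-of-dicts input format and the function name `vertex_properties` are assumptions made for this example only; they are not part of the formalism.

```python
def vertex_properties(weights):
    """Return {vertex: (k, s, Y)} for an undirected weighted graph.

    `weights` maps each vertex to a dict {neighbor: w_ij} and is assumed
    to be symmetric (hypothetical input format chosen for this sketch).
    """
    props = {}
    for i, nbrs in weights.items():
        k = len(nbrs)                  # degree k_i
        s = sum(nbrs.values())         # strength s_i = sum_j w_ij
        # disparity Y_i = sum_j (w_ij / s_i)^2; set to 0.0 for isolated vertices
        y = sum((w / s) ** 2 for w in nbrs.values()) if s > 0 else 0.0
        props[i] = (k, s, y)
    return props


# Toy graph: a star with centre 0 and three leaves
star = {
    0: {1: 1.0, 2: 1.0, 3: 2.0},
    1: {0: 1.0},
    2: {0: 1.0},
    3: {0: 2.0},
}
props = vertex_properties(star)
```

For a vertex whose $k$ edges all carry the same weight the disparity equals $1/k$, whereas a single dominant edge pushes it towards 1; every vertex of degree one has $Y = 1$.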

To quantify two-point correlations in weighted networks, we start by defining two matrices [?]. Let $E_{\alpha\alpha'}$ be the matrix accounting for the number of connections between the class of vertices $\alpha$ and the class of vertices $\alpha'$ (two times this number if the two classes are the same). Analogously, let $W_{\alpha\alpha'}$ be the matrix that accounts for the total weight between the same pair of classes. Let $N$, $E$, and $W$ be the number of vertices, the number of edges, and the total weight of the network, respectively. Then, the fundamental functions characterizing the two-point correlation structure in WCNs are

$$ P(\alpha, \alpha') = \frac{E_{\alpha\alpha'}}{\langle k \rangle N} \quad \text{and} \quad Q(\alpha, \alpha') = \frac{W_{\alpha\alpha'}}{\langle s \rangle N}. \tag{1} $$



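Both functions have a clear interpretation [8]. Indeed, $(2 - \delta_{\alpha\alpha'}) P(\alpha, \alpha')$ is the probability that a randomly chosen edge of the network connects two vertices of the classes $\alpha$ and $\alpha'$. Analogously, $(2 - \delta_{\alpha\alpha'}) Q(\alpha, \alpha')$ gives the probability that, when choosing an edge of the network with a probability proportional to its weight, this edge connects two vertices of the classes $\alpha$ and $\alpha'$. These fundamental functions satisfy the summation rules $\sum_{\alpha'} P(\alpha, \alpha') = k P(\alpha)/\langle k \rangle$ and $\sum_{\alpha'} Q(\alpha, \alpha') = s P(\alpha)/\langle s \rangle$. This allows us to define the relevant conditional probabilities

$$ P(\alpha'|\alpha) = \frac{\langle k \rangle P(\alpha, \alpha')}{k P(\alpha)} \quad \text{and} \quad Q(\alpha'|\alpha) = \frac{\langle s \rangle Q(\alpha, \alpha')}{s P(\alpha)}. \tag{2} $$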
As usual, $P(\alpha'|\alpha)$ measures the probability that a randomly chosen edge from a vertex in the class $\alpha$ points to a vertex in the class $\alpha'$. It is the equivalent for WCNs of the conditional probability $P(k'|k)$ measuring the topological correlations between nearest neighbors [?], but now with the extra information provided by the dependence on strength and disparity. The conditional probability $Q(\alpha'|\alpha)$ measures the probability that, when randomly choosing a vertex in the class $\alpha$ and following one of its edges with probability proportional to its weight, the vertex at the other end belongs to the class $\alpha'$. It is a measure specific to WCNs, relating the effect of the weights to the strength of the correlations. In a similar fashion to the case of unweighted networks, we can define, as a more practical correlation function, the average degree of the neighbors of vertices of class $\alpha$, but now weighted by the conditional probability $Q(\alpha'|\alpha)$, that is, $k_{nn}^{w}(\alpha) = \sum_{\alpha'} k' Q(\alpha'|\alpha)$. This is still a function of three variables, which is difficult to analyze. Therefore, we coarse-grain the degrees of freedom corresponding to $s$ and $Y$ in the following way:

$$ \bar{k}_{nn}^{w}(k) = \frac{1}{P(k)} \sum_{s,Y} P(\alpha)\, k_{nn}^{w}(\alpha) = \frac{1}{N_k} \sum_{i \in \mathcal{V}(k)} \frac{1}{s_i} \sum_j w_{ij} k_j, \tag{3} $$

where the last term defines the numerical implementation of this function. The summation over $i$ involves all vertices with degree $k$, $\mathcal{V}(k)$, and $N_k$ is the number of vertices with that degree. We note that this measure coincides with the one proposed in Ref. [?].
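
The numerical form of the weighted average nearest-neighbor degree can be sketched in a few lines of Python. The code below is only an illustration: it assumes a symmetric dict-of-dicts graph representation, and the function name `knn_weighted` is hypothetical.

```python
from collections import defaultdict


def knn_weighted(weights):
    """For each degree k, average over vertices i of degree k of
    (1 / s_i) * sum_j w_ij * k_j."""
    degree = {i: len(nbrs) for i, nbrs in weights.items()}
    totals = defaultdict(float)   # sum of per-vertex values in each degree class
    counts = defaultdict(int)     # N_k, number of vertices of degree k
    for i, nbrs in weights.items():
        if not nbrs:
            continue              # isolated vertices have no neighbors
        s_i = sum(nbrs.values())
        totals[degree[i]] += sum(w * degree[j] for j, w in nbrs.items()) / s_i
        counts[degree[i]] += 1
    return {k: totals[k] / counts[k] for k in totals}


# Toy graph (assumed example): path 1 - 0 - 2 - 3 with a heavy edge 0-1
toy = {
    0: {1: 3.0, 2: 1.0},
    1: {0: 3.0},
    2: {0: 1.0, 3: 1.0},
    3: {2: 1.0},
}
knn = knn_weighted(toy)
```

In this toy graph the heavy edge of vertex 0 points to a vertex of degree one, so the weighted value for $k = 2$ is 1.375, below the unweighted average of 1.5.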

Turning now to three-vertex correlations, they are fully characterized by the three-vertex conditional probability $Q(\alpha', \alpha''|\alpha)$, which measures the likelihood that a vertex of class $\alpha$ is simultaneously connected to vertices of classes $\alpha'$ and $\alpha''$ when the weights of both connections are considered. In unweighted networks, the information about three-vertex correlations can be conveniently condensed into the degree-dependent clustering coefficient $c(k)$. Similarly, for WCNs we can define a weighted clustering coefficient as $c^{w}(\alpha) = \sum_{\alpha', \alpha''} Q(\alpha', \alpha''|\alpha)\, r^{\alpha}_{\alpha'\alpha''}$, where $r^{\alpha}_{\alpha'\alpha''}$ is the probability that two vertices in the classes $\alpha'$ and $\alpha''$ are joined, provided that they have a common neighbor in the class $\alpha$. Once again, we can integrate out the strength and disparity to obtain

$$ \bar{c}^{w}(k) = \frac{1}{P(k)} \sum_{s,Y} P(\alpha)\, c^{w}(\alpha), \tag{4} $$

which represents the natural generalization for WCNs of the clustering coefficient $c(k)$. Numerically, this function is given by

$$ \bar{c}^{w}(k) = \frac{1}{N_k} \sum_{i \in \mathcal{V}(k)} \frac{1}{s_i^2 (1 - Y_i)} \sum_{j,l} w_{ij} w_{il} a_{jl}. \tag{5} $$

Notice that this is different from the definition given in Ref. [?].
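
A possible implementation of the weighted clustering coefficient is sketched below in Python. It assumes the normalization $s_i^2 (1 - Y_i) = s_i^2 - \sum_j w_{ij}^2$, which follows from the definition of the disparity, and a symmetric dict-of-dicts graph; the function name `clustering_weighted` is hypothetical.

```python
from collections import defaultdict


def clustering_weighted(weights):
    """For each degree k >= 2, average over vertices i of degree k of
    sum_{j,l} w_ij w_il a_jl / (s_i^2 (1 - Y_i))."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for i, nbrs in weights.items():
        k = len(nbrs)
        if k < 2:
            continue                        # no neighbor pairs to close
        s_i = sum(nbrs.values())
        # s_i^2 (1 - Y_i) written as s_i^2 - sum_j w_ij^2
        norm = s_i * s_i - sum(w * w for w in nbrs.values())
        # ordered pairs (j, l) of distinct neighbors that are themselves linked
        closed = sum(
            w_ij * w_il
            for j, w_ij in nbrs.items()
            for l, w_il in nbrs.items()
            if l != j and l in weights[j]
        )
        totals[k] += closed / norm
        counts[k] += 1
    return {k: totals[k] / counts[k] for k in totals}


# Toy graph (assumed example): triangle 0-1-2 plus a pendant vertex 3 on 0
toy = {
    0: {1: 2.0, 2: 2.0, 3: 1.0},
    1: {0: 2.0, 2: 1.0},
    2: {0: 2.0, 1: 1.0},
    3: {0: 1.0},
}
cw = clustering_weighted(toy)
```

Here vertex 0 closes one of its three neighbor pairs, so its unweighted clustering is 1/3, but the closed pair carries the two heaviest edges and the weighted value rises to 0.5.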

The reference point of zero correlations is given by the so-called uncorrelated network ensemble, defined as the ensemble for which the joint distributions in Eqs. (1) factorize as $P(\alpha, \alpha') = k k' P(\alpha) P(\alpha')/\langle k \rangle^2$ and $Q(\alpha, \alpha') = s s' P(\alpha) P(\alpha')/\langle s \rangle^2$. In this case, one can easily prove that the measures defined above become

$$ \bar{k}_{nn}^{w}(k) = \frac{\langle k\, s(k) \rangle}{\langle s \rangle} \quad \text{and} \quad \bar{c}^{w}(k) = \frac{\langle (k-1)\, s(k) \rangle^2}{\langle k \rangle \langle s \rangle^2 N}, \tag{6} $$

where $s(k) = \sum_{s,Y} s P(\alpha)/P(k)$. We have also assumed that, for randomly assembled networks, $Q(\alpha', \alpha''|\alpha) = Q(\alpha'|\alpha) Q(\alpha''|\alpha)$. As one can see, all these functions become independent of the degree, so that any non-trivial dependence on $k$ will signal the presence of two- and three-vertex correlations, respectively.
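
The uncorrelated values can be recovered in a few lines from the factorized form of $Q(\alpha, \alpha')$ and the definition of the conditional probability $Q(\alpha'|\alpha)$. The sketch below is a worked derivation under one extra assumption that is not stated explicitly in the text: for the clustering, the connection probability between two neighbors of a common vertex is taken to have the standard uncorrelated form $(k'-1)(k''-1)/(\langle k \rangle N)$.

```latex
\begin{align}
Q(\alpha'|\alpha) &= \frac{\langle s\rangle\, Q(\alpha,\alpha')}{s\,P(\alpha)}
  = \frac{s' P(\alpha')}{\langle s\rangle}, \\
k_{nn}^{w}(\alpha) &= \sum_{\alpha'} k'\, Q(\alpha'|\alpha)
  = \frac{1}{\langle s\rangle} \sum_{k'} k' \sum_{s',Y'} s' P(\alpha')
  = \frac{\langle k\, s(k)\rangle}{\langle s\rangle}, \\
c^{w}(\alpha) &= \sum_{\alpha',\alpha''} Q(\alpha'|\alpha)\, Q(\alpha''|\alpha)\,
  \frac{(k'-1)(k''-1)}{\langle k\rangle N}
  = \frac{\langle (k-1)\, s(k)\rangle^{2}}{\langle k\rangle \langle s\rangle^{2} N}.
\end{align}
```

Neither result depends on $\alpha$, so averaging over $s$ and $Y$ at fixed $k$ leaves both functions constant in $k$.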

In fact, one can see that, for any WCN, the joint distributions $P(\alpha, \alpha')$ and $Q(\alpha, \alpha')$ cannot factorize except for large degrees. Consider, for instance, vertices of degree $k = 1$ and strength $s$. The neighbors of such vertices must have a strength of at least $s$, meaning that the properties of the neighbor depend on the properties of the first vertex. Vertices of degree $k = 2$ and strength $s$ have weights on their connections that are a fraction of $s$ and, therefore, the strength of each of their neighbors must be at least the corresponding fraction of $s$. The same effect is present, although in a weaker form, for vertices of higher degrees. Therefore, purely uncorrelated WCNs cannot exist. Only in the case of large degrees do these structural correlations become very weak.
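
The constraint for vertices of degree one is easy to check on any data set. The short Python sketch below, which assumes a symmetric dict-of-dicts graph and uses the hypothetical helper name `degree_one_pairs`, lists the strength of every degree-one vertex next to the strength of its only neighbor.

```python
def degree_one_pairs(weights):
    """Return (s_i, s_j) for every degree-one vertex i and its only neighbor j."""
    strength = {i: sum(nbrs.values()) for i, nbrs in weights.items()}
    pairs = []
    for i, nbrs in weights.items():
        if len(nbrs) == 1:
            (j,) = nbrs               # the single neighbor of i
            pairs.append((strength[i], strength[j]))
    return pairs


# Toy graph (assumed example): path 1 - 0 - 2 - 3 with very unequal weights
toy = {
    0: {1: 5.0, 2: 0.5},
    1: {0: 5.0},
    2: {0: 0.5, 3: 2.0},
    3: {2: 2.0},
}
pairs = degree_one_pairs(toy)
# The neighbor's strength is bounded from below by the leaf's strength.
bound_holds = all(s_j >= s_i for s_i, s_j in pairs)
```

Because the edge weight $w_{ij} = s_i$ is one of the terms of $s_j$, the bound holds for any input with positive weights, which is precisely why the strengths of the two end vertices cannot be statistically independent.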

The highest level of randomness attainable in a WCN does not correspond to the factorization of Eqs. (1), which is impossible, but to that of their marginal distributions, $P(k, k') = \sum_{s,Y} \sum_{s',Y'} P(\alpha, \alpha')$ and $Q(k, k') = \sum_{s,Y} \sum_{s',Y'} Q(\alpha, \alpha')$. We can then define the corresponding conditional probability $Q(k'|k) = \langle s \rangle Q(k, k')/[s(k) P(k)]$ and the two-vertex correlation function $\bar{k}_{nn}^{w,ns}(k) = \sum_{k'} k' Q(k'|k)$, which filters out the structural correlations. It is numerically computed as

$$ \bar{k}_{nn}^{w,ns}(k) = \frac{1}{N_k} \sum_{i \in \mathcal{V}(k)} \frac{1}{s(k)} \sum_j w_{ij} k_j. \tag{7} $$

In this function, the contribution of every vertex $i$ depends on the average strength of all the vertices with the same degree $k$. This implies an averaging that cancels out the effect of weight-induced correlations and yields a constant behavior when the marginal distributions factorize. The same line of reasoning also applies to clustering. The non-structural weighted clustering coefficient reads

$$ \bar{c}^{w,ns}(k) = \frac{1}{N_k} \sum_{i \in \mathcal{V}(k)} \frac{1}{\overline{s^2 (1 - Y)}(k)} \sum_{j,l} w_{ij} w_{il} a_{jl}, \tag{8} $$

with $\overline{s^2 (1 - Y)}(k)$ being an average over vertices of degree $k$.
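
Both non-structural measures replace the per-vertex normalization by its average over the degree class, so each of them reduces to a ratio of two sums over the vertices of degree $k$. A minimal Python sketch under the same assumptions as before (symmetric dict-of-dicts graph, hypothetical function name `nonstructural_measures`) reads:

```python
from collections import defaultdict


def nonstructural_measures(weights):
    """Return (knn_ns, c_ns), both keyed by degree k."""
    degree = {i: len(nbrs) for i, nbrs in weights.items()}
    num_k = defaultdict(float)     # sum_i sum_j w_ij k_j
    sum_s = defaultdict(float)     # sum_i s_i           (= N_k * s(k))
    num_c = defaultdict(float)     # sum_i sum_{j,l} w_ij w_il a_jl
    sum_norm = defaultdict(float)  # sum_i s_i^2 (1 - Y_i)
    for i, nbrs in weights.items():
        k = degree[i]
        if k == 0:
            continue
        s_i = sum(nbrs.values())
        sum_s[k] += s_i
        num_k[k] += sum(w * degree[j] for j, w in nbrs.items())
        sum_norm[k] += s_i * s_i - sum(w * w for w in nbrs.values())
        num_c[k] += sum(
            w_ij * w_il
            for j, w_ij in nbrs.items()
            for l, w_il in nbrs.items()
            if l != j and l in weights[j]
        )
    # (1/N_k) sum_i X_i / class average = sum_i X_i / sum_i (normalization)
    knn_ns = {k: num_k[k] / sum_s[k] for k in sum_s}
    c_ns = {k: num_c[k] / sum_norm[k] for k in sum_norm if sum_norm[k] > 0}
    return knn_ns, c_ns


# Toy graph (assumed example): triangle 0-1-2 plus a pendant vertex 3 on 0
toy = {
    0: {1: 2.0, 2: 2.0, 3: 1.0},
    1: {0: 2.0, 2: 1.0},
    2: {0: 2.0, 1: 1.0},
    3: {0: 1.0},
}
knn_ns, c_ns = nonstructural_measures(toy)
```

Degree classes with vanishing normalization, such as $k = 1$ for the clustering, are simply omitted from the output.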

FIG. 1: Correlation measures for a random WCN generated with the algorithm defined in the text, with $P(k) \sim k^{-2.5}$ and $s(k) \propto k^{1.5}$.

To check the accuracy of this approach, we need a null model as a gauge for the presence or absence of non-structural correlations. This requires the construction of maximally random WCNs, which can be easily inferred from the proposed formalism. The strategy consists in defining an ensemble at the hidden level where the local properties are fixed and where we can assume that the fundamental functions factorize [?]. Instead of working with the joint distributions, it is more convenient to define the new quantities $r_{\alpha\alpha'}$ and $\bar{w}_{\alpha\alpha'}$,

$$ r_{\alpha\alpha'} = \frac{\langle k \rangle P(\alpha, \alpha')}{N P(\alpha) P(\alpha')} \quad \text{and} \quad \bar{w}_{\alpha\alpha'} = \frac{\langle s \rangle Q(\alpha, \alpha')}{\langle k \rangle P(\alpha, \alpha')}. \tag{9} $$

The first specifies the ratio between the number of connections between two classes and its maximum possible value. The second corresponds to the average weight of an edge connecting two equivalence classes. Now, assuming the factorization of the fundamental functions, $r_{\alpha\alpha'}$ and $\bar{w}_{\alpha\alpha'}$ take the simple forms

$$ r_{\alpha\alpha'} = \frac{k k'}{\langle k \rangle N} \quad \text{and} \quad \bar{w}_{\alpha\alpha'} = \frac{\langle k \rangle s s'}{\langle s \rangle k k'}, \tag{10} $$

a result implying that the topology of the network at the hidden level is decoupled from the weights and, more importantly, independent of the disparity. Using this result, we can generate a WCN without two-point correlations (other than the structural ones) in the following way. We first construct an uncorrelated network with a given degree distribution $P(k)$ using any of the algorithms available in the literature [?, 23]. After the network has been assembled, we assign an expected strength to each vertex according to the distribution $g(s|k)$, under the constraint that $P(k, s) = P(k) g(s|k)$. Finally, each edge is assigned a weight according to Eq. (10). In this way, we can generate WCNs with any non-trivial correlation between strength and degree and any form of the degree distribution. It is important to notice that, in principle, the expected and final strengths of a vertex are not equal. However, one can prove that both quantities converge on average.
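
The construction just described can be sketched as follows. This is one possible realization under stated assumptions, not a definitive implementation: the topology is drawn by connecting each pair of vertices independently with probability $k k'/(\langle k \rangle N)$, in the spirit of the Chung and Lu ensemble, whereas the text allows any uncorrelated-network algorithm; the expected strengths are supplied by the caller; and the function name `random_wcn` is hypothetical.

```python
import random


def random_wcn(expected_k, expected_s, seed=0):
    """Sketch of a maximally random WCN from expected degrees and strengths.

    Assumes every expected degree is positive and that
    max(k)^2 <= <k> N, so that the connection probability stays below 1.
    """
    rng = random.Random(seed)
    n = len(expected_k)
    mean_k = sum(expected_k) / n
    mean_s = sum(expected_s) / n
    weights = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):          # no self-loops, each pair once
            k_i, k_j = expected_k[i], expected_k[j]
            # connection probability r = k k' / (<k> N)
            if rng.random() < min(1.0, k_i * k_j / (mean_k * n)):
                # edge weight w = <k> s s' / (<s> k k')
                w = mean_k * expected_s[i] * expected_s[j] / (mean_s * k_i * k_j)
                weights[i][j] = w
                weights[j][i] = w
    return weights


# Example input (assumed): two degree classes with s proportional to k^1.5
n = 400
expected_k = [2.0 if i % 2 == 0 else 6.0 for i in range(n)]
expected_s = [k ** 1.5 for k in expected_k]
net = random_wcn(expected_k, expected_s, seed=1)
realized_total = sum(sum(nbrs.values()) for nbrs in net.values())
expected_total = sum(expected_s)
```

With this choice the expected realized strength of vertex $i$ is $\sum_j r_{ij} \bar{w}_{ij} = s_i \sum_j s_j/(\langle s \rangle N) = s_i$ up to the excluded self-pair, which illustrates why expected and final strengths agree on average even though they differ in any single realization. Realized degrees likewise fluctuate around their expected values.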

In Fig. 1 we compare the weighted correlation functions with their unweighted counterparts for a WCN constructed with the algorithm explained above. We observe that the weighted correlation functions are not flat, as they would be for an uncorrelated network, but show a degree dependence for small $k$, saturating to a constant plateau for large $k$. In contrast, the non-structural functions recover the expected uncorrelated behavior, independent of $k$.

FIG. 2: Correlation measures for real networks. From top to bottom, the US airport network (USAN) for the year 2005, the world trade web (WTW) for the year 2000, and the scientific collaboration network (SCN).

Correlation measures for three different real networks are shown in Fig. 2. The first observation is that, in general, weighted measures differ greatly from the unweighted ones, offering a completely different picture with respect to the bare topology. For the USAN and the WTW, the almost flat behavior indicates that weighted two- and three-point correlations are extremely weak, in contrast to the unweighted measures, which show important dependencies on $k$. This suggests that the understanding of their formation processes, or their modeling, can be simplified by neglecting correlations at the weighted level. Besides, the noticeable difference between the weighted measures and their non-structural counterparts in the USAN graphs shows that structural correlations are more important for this network. On the other hand, all measures follow a similar behavior in the SCN. However, whereas the weighted two-point measure indicates that the network is more assortative than the unweighted estimate suggests, the non-structural measure indicates that this is due to a structural effect since, except for very high degrees, $\bar{k}_{nn}^{w,ns}(k) < k_{nn}(k) < \bar{k}_{nn}^{w}(k)$. This effect is even more evident in the case of clustering. The weighted measure shows that the tendency to form triangles is more important when weights are considered. However, the non-structural measure is significantly smaller than the unweighted one, which means that, when discounting structural effects, the tendency to form triangles is in fact less pronounced.

Summarizing, we have shown that strictly uncorrelated WCNs at the local level do not exist due to the presence of structural constraints. From a rigorous formal framework, we have defined the appropriate weighted correlation measures that quantify the overall level of correlations. We also propose complementary non-structural measures that filter out the structural component and quantify the level of correlations in the network as compared with the maximum randomness attainable. In this respect, we have introduced an algorithm that generates maximally random WCNs with an arbitrary $P(k, s)$ to be used as null models. We have applied our formalism to analyze three different heterogeneous networks. The results make evident the importance of taking weights into account to properly describe this class of systems.